Generalized Entropy for Splitting on Numerical Attributes in Decision Trees
Authors
Abstract
Decision trees are well known for their training efficiency and their interpretable knowledge representation. They apply a greedy search and a divide-and-conquer approach to learn patterns. The greedy search relies on an evaluation criterion applied to the candidate splits at each node. Although research has been performed on various such criteria, none significantly improves on the classical splitting approaches introduced in the early decision tree literature. This paper presents a new evaluation rule to determine candidate splits in decision tree classifiers. The experiments show that this new evaluation rule reduces the size of the resulting tree while maintaining the tree's accuracy.
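To make the greedy search concrete, the sketch below scores candidate binary splits of one numerical attribute by information gain under Shannon entropy, the classical criterion the paper compares against. This is a minimal illustration, not the paper's new evaluation rule; the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(values, labels):
    """Scan midpoints between sorted attribute values and return the
    (threshold, information_gain) pair with the highest gain."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    n = len(pairs)
    best_thr, best_gain = None, 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal attribute values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lbl for _, lbl in pairs[:i]]
        right = [lbl for _, lbl in pairs[i:]]
        gain = (base
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        if gain > best_gain:
            best_thr, best_gain = thr, gain
    return best_thr, best_gain
```

A tree inducer calls such a routine for every attribute at every node and splits on the attribute/threshold pair with the best score, which is why the choice of criterion shapes the final tree size.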
Similar resources
Enhancing Network Intrusion Classification through the Kolmogorov-Smirnov Splitting Criterion
Our investigation aims at detecting network intrusions using decision tree algorithms. Large differences in prior class probabilities of intrusion data have been reported to hinder the performance of decision trees. We propose to replace the Shannon entropy used in tree induction algorithms with a Kolmogorov-Smirnov splitting criterion, which locates a Bayes-optimal cutpoint of attributes. The K...
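The idea behind a Kolmogorov-Smirnov cutpoint can be sketched for the two-class case: the statistic at a threshold is the gap between the per-class empirical CDFs of the attribute, and the chosen cutpoint maximizes that gap. This is an illustrative sketch of the general technique, not the cited paper's implementation; `ks_cutpoint` and `pos_label` are assumed names.

```python
def ks_cutpoint(values, labels, pos_label):
    """Return the (cutpoint, distance) maximizing the Kolmogorov-Smirnov
    distance between the two per-class empirical CDFs of one attribute.
    Two-class sketch; `pos_label` identifies one of the two classes."""
    pairs = sorted(zip(values, labels))
    n_pos = sum(1 for _, lbl in pairs if lbl == pos_label)
    n_neg = len(pairs) - n_pos
    seen_pos = seen_neg = 0
    best_t, best_d = None, 0.0
    for i, (v, lbl) in enumerate(pairs):
        if lbl == pos_label:
            seen_pos += 1
        else:
            seen_neg += 1
        # only evaluate the CDF gap once all ties at value v are consumed
        if i + 1 < len(pairs) and pairs[i + 1][0] == v:
            continue
        d = abs(seen_pos / n_pos - seen_neg / n_neg)
        if d > best_d:
            best_t, best_d = v, d
    return best_t, best_d
```

Unlike entropy, this score depends only on the ranking of values within each class, which is one reason a rank-based criterion can be less sensitive to skewed class priors.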
Splitting Methods for Decision Tree Induction: A Comparison of Two Families
Decision tree (DT) induction is among the more popular of the data mining techniques. An important component of DT induction algorithms is the splitting method, with the most commonly used method being based on the conditional entropy family. However, it is well known that there is no single splitting method that will give the best performance for all problem instances. In this paper we explore...
A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining
Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...
General and Efficient Multisplitting of Numerical Attributes
Often in supervised learning, numerical attributes require special treatment and do not fit the learning scheme as well as one could hope. Nevertheless, they are common in practical tasks and therefore need to be taken into account. We characterize the well-behavedness of an evaluation function, a property that guarantees the optimal multi-partition of an arbitrary numerical domain to be defined ...
Constructing Efficient Decision Trees by Using Optimized Numeric Association Rules
We propose an extension of an entropy-based heuristic of Quinlan [Q93] for constructing a decision tree from a large database with many numeric attributes. Quinlan pointed out that his original method (as well as other existing methods) may be inefficient if any numeric attributes are strongly correlated. Our approach offers one solution to this problem. For each pair of numeric attributes with st...